ECON122 - Stop and Frisk Project

Data Science and Statistical Learning Course Final Project.

Seungho (Samuel) Lee (Claremont McKenna College), Priyanka Agarwal (Harvey Mudd College)
December 20, 2019

Project Overview

Background:

The Stop, Question and Frisk program is a New York City Police Department (NYPD) practice of temporarily halting, questioning, and, in certain cases, searching civilians on the street for weapons and other dangers. It is also called a “Terry stop,” after the Supreme Court case Terry v. Ohio (1968).

The Stop, Question and Frisk practice is often justified by the Broken Windows Theory, which suggests that even low-level crimes and civil disorder lead to more serious crimes in urban environments. However, after NYPD officer Adrian Schoolcraft made extensive recordings about the department’s Stop and Frisk policy, numerous civil rights organizations, such as the NYCLU, raised concerns that the program unfairly targets certain minorities, such as African-Americans and Hispanic-Americans.

A major turning point was the 2013 court case Floyd v. City of New York and a subsequent NYPD mandate that requires officers to thoroughly justify the reason for making a stop.[34] In 2013, 191,558 stops were made.[35]

Project Objectives:

  1. Visualize the data to build intuition for the aspects involved
  2. Determine whether Bloomberg’s policy had an effect

Methodology

Data Used

  1. Crime Analysis
  2. Policy Analysis

# Samuel's Part
library(readr) # provides read_csv()
SQF1718 <- read_csv("https://github.com/samuellee19/econ122_sqf/raw/master/Data%20Files%20Only/SQF1718.csv")

# Priyanka's Part
library("rio")

# Download an Excel file, convert it to csv, delete the Excel copy, and read the csv
read_sqf_xlsx <- function(url, destfile) {
  curl::curl_download(url, destfile)
  csvfile <- gsub("xlsx$", "csv", destfile)
  convert(destfile, csvfile)
  unlink(destfile) # delete xlsx file
  read.csv(csvfile)
}

SQF2018 <- read_sqf_xlsx(
  "https://www.nyclu.org/sites/default/files/field_documents/2018_sqf_database.xlsx",
  "2018_sqf_database.xlsx"
)

SQF2017 <- read_sqf_xlsx(
  "https://www1.nyc.gov/assets/nypd/downloads/excel/analysis_and_planning/stop-question-frisk/sqf-2017.xlsx",
  "2017_sqf_database.xlsx"
)

SQF2016 <- read_sqf_xlsx(
  "https://www.nyclu.org/sites/default/files/field_documents/2016_sqf_database.xlsx",
  "2016_sqf_database.xlsx"
)

# The 2015 data is already published as csv, so no conversion is needed
url <- "https://www1.nyc.gov/assets/nypd/downloads/excel/analysis_and_planning/stop-question-frisk/sqf-2015.csv"
destfile <- "sqf-2015.csv"
curl::curl_download(url, destfile)
SQF2015 <- read.csv(destfile)
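
If the yearly files do not all share the same columns, one way to stack them is to keep only the columns they have in common and tag each row with its year. The following is a minimal sketch under that assumption; the two toy data frames and their column names (stop_id, sex, extra_a, extra_b) are hypothetical stand-ins, not the real SQF schema.

```r
# Toy stand-ins for two yearly files with partly different columns (invented values)
sqf_a <- data.frame(stop_id = 1:2, sex = c("MALE", "FEMALE"), extra_a = c(TRUE, FALSE))
sqf_b <- data.frame(stop_id = 3:4, sex = c("MALE", "MALE"), extra_b = c("x", "y"))

# Name each data frame by its year
yearly <- list("2017" = sqf_a, "2018" = sqf_b)

# Columns present in every yearly file
shared <- Reduce(intersect, lapply(yearly, names))

# Keep the shared columns, add a year column, and stack the rows
combined <- do.call(rbind, lapply(names(yearly), function(y) {
  d <- yearly[[y]][, shared, drop = FALSE]
  d$year <- as.integer(y)
  d
}))

print(combined)
```

With the real data, the list would hold SQF2015 through SQF2018 instead of the toy frames.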

Visualizations

Effect of Stop and Frisk on Crime Rates


library(ggplot2)
# Graphing number of stop and frisks (count) vs change in crime (crimeChange), coloring points by year
# No trend in the data at all
ggplot(crimeAndCount, mapping = aes(x = crimeChange, y = count)) +
  geom_point(mapping = aes(color = year)) +
  geom_smooth(method = "lm", color = "red")


# Separating by year in case policy changes had an effect; still no clear trend
ggplot(crimeAndCount, mapping = aes(x = crimeChange, y = count)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red") +
  facet_wrap(~year)
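
The plots above use a crimeAndCount data frame whose construction is not shown here. As a rough sketch of how such a table might be assembled, the example below counts stops per area and year, computes the year-over-year change in crime, and joins the two. The grouping column precinct, the crimes column, and all values are assumptions made for illustration only.

```r
# Toy stop records: one row per stop (invented values)
stops <- data.frame(
  year     = c(2015, 2015, 2015, 2016, 2016, 2015, 2016, 2016, 2016),
  precinct = c(1, 1, 1, 1, 1, 2, 2, 2, 2)
)

# Toy yearly crime totals per precinct (invented values)
crime <- data.frame(
  year     = c(2014, 2015, 2016, 2014, 2015, 2016),
  precinct = c(1, 1, 1, 2, 2, 2),
  crimes   = c(100, 90, 95, 50, 60, 55)
)

# Number of stops per precinct and year
counts <- aggregate(list(count = rep(1, nrow(stops))),
                    by = list(year = stops$year, precinct = stops$precinct),
                    FUN = sum)

# Year-over-year change in crime within each precinct (NA for the first year)
crime <- crime[order(crime$precinct, crime$year), ]
crime$crimeChange <- ave(crime$crimes, crime$precinct,
                         FUN = function(x) c(NA, diff(x)))

# Join stop counts and crime change on year and precinct
crimeAndCount <- merge(counts, crime[, c("year", "precinct", "crimeChange")],
                       by = c("year", "precinct"))

print(crimeAndCount)
```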

Relationship Between Crime and Stop And Frisk


# Graphing number of stop and frisks (count) vs standard deviation of crime count (crimeSD), coloring points by year
# No trend in the data at all
ggplot(crimeAndCount, mapping = aes(x = crimeSD, y = count)) +
  geom_point(mapping = aes(color = year)) +
  geom_smooth(method = "lm", color = "red")


# Separating by year in case policy changes had an effect; still no clear trend
ggplot(crimeAndCount, mapping = aes(x = crimeSD, y = count)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red") +
  facet_wrap(~year)
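
A visual reading of "no clear trend" can be backed with numbers by fitting the same kind of linear model that geom_smooth(method = "lm") draws and inspecting the slope and R-squared for each year. The sketch below does this on a small invented data frame; only the column names (year, crimeSD, count) are taken from the plotting code, and the values are not project results.

```r
# Toy stand-in for crimeAndCount; the values are invented for illustration only
toy <- data.frame(
  year    = rep(c(2015, 2016), each = 4),
  crimeSD = c(-1.0, -0.5, 0.5, 1.0, -1.0, -0.5, 0.5, 1.0),
  count   = c(10, 14, 9, 13, 8, 7, 9, 8)
)

# Fit count ~ crimeSD separately for each year, mirroring facet_wrap(~year)
fits <- lapply(split(toy, toy$year), function(d) lm(count ~ crimeSD, data = d))

# Collect the slope and R-squared of each yearly fit
trend_summary <- data.frame(
  year      = names(fits),
  slope     = sapply(fits, function(f) coef(f)[["crimeSD"]]),
  r_squared = sapply(fits, function(f) summary(f)$r.squared)
)

print(trend_summary)
```

A slope near zero together with a low R-squared in every year would be consistent with the absence of a linear trend.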

What Factors Actually Matter: Analysis of Effects of Change on Policy

  1. Split the data into three data sets, run 2-3 different feature-based regressions on each to identify the most important variables, then average the importance for each variable type and compare.
  2. Set a binary variable indicating which time period each observation falls in relative to Bloomberg, then examine the weight that variable receives under the different regression methods.
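
The second step can be sketched as follows. This is a minimal illustration, not the project's actual model: the cutoff year, the column names frisked and arrested, and all values are assumptions, and a plain linear model stands in for whichever regression methods are ultimately used.

```r
# Toy stop-level data (invented values); frisked and arrested are hypothetical columns
toy <- data.frame(
  year     = c(2012, 2012, 2013, 2013, 2014, 2014, 2015, 2015),
  frisked  = c(1, 0, 1, 1, 0, 1, 0, 0),
  arrested = c(0, 0, 1, 0, 1, 1, 0, 1)
)

# Assumed cutoff: years after 2013 are treated as the post-Bloomberg period
cutoff_year <- 2013

# Binary indicator: 1 for stops after the cutoff year, 0 otherwise
toy$post_bloomberg <- as.integer(toy$year > cutoff_year)

# Linear probability model as one simple regression method
fit <- lm(arrested ~ post_bloomberg + frisked, data = toy)

# Weight given to the period indicator alongside the other predictor
indicator_weights <- coef(fit)[c("post_bloomberg", "frisked")]
print(indicator_weights)
```

Comparing the coefficient on post_bloomberg across several fitted models is one way to see how much weight each method assigns to the time period.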

Findings and analysis

  1. No visible relationship between the number of stop and frisks and changes in the crime rate
  2. No visible relationship between the number of stop and frisks and the crime level of an area

Conclusion